REPORT  DOCUMENTATION  PAGE 


Form  Approved  OMB  NO.  0704-0188 




1. REPORT DATE (DD-MM-YYYY)


2.  REPORT  TYPE 
Technical  Report 


4.  TITLE  AND  SUBTITLE 

Network  Evolution  by  Relevance  and  Importance  Preferential 
Attachment 


3.  DATES  COVERED  (From  -  To) 


5a.  CONTRACT  NUMBER 
W911NF-12-1-0546


5b.  GRANT  NUMBER 


6.  AUTHORS 
C  Lim,  Weituo  Zhang 


5c.  PROGRAM  ELEMENT  NUMBER 

611102 


5d.  PROJECT  NUMBER 


5e.  TASK  NUMBER 


5f.  WORK  UNIT  NUMBER 


7.  PERFORMING  ORGANIZATION  NAMES  AND  ADDRESSES 

Rensselaer  Polytechnic  Institute 
110  8th  Street 


8.  PERFORMING  ORGANIZATION  REPORT 
NUMBER 


Troy,  NY  12180  -3522 


9.  SPONSORING/MONITORING  AGENCY  NAME(S)  AND  ADDRESS 
(ES) 

U.S.  Army  Research  Office 
P.O.Box  12211 

Research  Triangle  Park,  NC  27709-2211 


12. DISTRIBUTION AVAILABILITY STATEMENT


10.  SPONSOR/MONITORS  ACRONYM(S) 
ARO 


11.  SPONSOR/MONITORS  REPORT 
NUMBER(S) 

62449-NS.15 


Approved  for  public  release;  distribution  is  unlimited. 


13.  SUPPLEMENTARY  NOTES 

The views, opinions and/or findings contained in this report are those of the author(s) and should not be construed as an official Department
of the Army position, policy or decision, unless so designated by other documentation.


14.  ABSTRACT 

Relevance and importance are the two main factors when people find or build network connections. We propose an
improved preferential attachment (PA) algorithm that takes into consideration the relevance between vertices of the
network, as measured by a given metric. We analyze the universal properties of the class of networks generated by this
algorithm and investigate two typical cases: scientific citation and between-city transportation. This is a brief report
of our research progress.


15.  SUBJECT  TERMS 

preferential  attachment  by  importance  and  relevance 


16. SECURITY CLASSIFICATION OF:
a. REPORT: UU
b. ABSTRACT: UU
c. THIS PAGE: UU

17. LIMITATION OF ABSTRACT: UU

15. NUMBER OF PAGES

Chjan Lim


19b.  TELEPHONE  NUMBER 
518-276-6904 


Standard  Form  298  (Rev  8/98) 
Prescribed by ANSI Std. Z39.18




Network  Evolution  by  Relevance  and  Importance 
Preferential  Attachment 

Weituo  Zhang,  Chjan  Lim 
August  6,  2014 


Abstract 

Relevance and importance are the two main factors when people find or build network
connections. We propose an improved preferential attachment (PA) algorithm that
takes into consideration the relevance between vertices of the network, as measured by a
given metric. We analyze the universal properties of the class of networks generated by
this algorithm and investigate two typical cases: scientific citation and between-city
transportation. This is a brief report of our research progress.


1  Introduction 

Relevance and importance are the two main factors when people find or build network
connections. One scenario is scientific research: when authors look for references, they
consider both the importance of an article and its relevance to their own topic.
Another scenario is planning between-city transportation: we prefer to connect a city
to other cities with higher connectivity, but we also want to reduce cost by selecting
nearby cities. In this paper, we propose an evolutionary network model with appealing
properties that takes both factors into consideration. Our work is based on the
“preferential attachment” (PA) algorithm introduced by Barabási and Albert. Classical
preferential attachment starts with a network of $N_0$ vertices and $m_0$ edges. New
vertices are added successively, and each is attached to $m < m_0$ preexisting vertices.
The probability of attaching to a vertex $i$ is proportional to its degree $k_i$. This
algorithm naturally generates networks with a power-law degree distribution
$p(k) \sim k^{-\gamma}$ with $\gamma = 3$. There are many variations of the PA algorithm
in the literature, from which we conclude that preferential attachment to high-degree
nodes, i.e. the “rich get richer” effect, is the essential reason for the emergence of a
scale-free degree distribution. In addition, we suggest that preferential attachment to
relevant nodes, i.e. “connecting to things nearby”, is the reason that networks have
clustering structures. Combining the two effects should lead to network models that are
both scale free and highly clustered, and this is the motivation of our work.

Although there is no rigorous definition of complex networks, many people consider the
following three to be the typical properties of complex networks: a power-law degree
distribution (scale free), a high clustering coefficient (clustering), and a short average
path-length (small world). Much effort has been devoted to finding network models that
capture these properties. The following table summarizes the properties of several known
network models.

Until now, few network models satisfactorily capture all three typical properties. Some
network models, such as the Random Apollonian Network (RAN), do, but they are entirely
artificial and do not reveal the mechanism from which the properties of real world
networks arise. The Relevance and Importance Preferential Attachment (RIPA) model we
propose here has all three properties under certain conditions, and at the same time
provides a natural explanation of these properties. Furthermore, it also has a
core-periphery structure, which is an important feature of some real world networks such
as the world airline network (WAN).

In  this  paper,  we  will  introduce  our  RIPA  network  model  given  by  an  evolution  process, 
analyze  several  network  properties,  and  compare  this  model  with  other  network  models  and 
some  empirical  data. 

2  Model 

In this section we describe the algorithm called Relevance and Importance Preferential
Attachment (RIPA), which generates a class of complex networks. RIPA, like classical
preferential attachment, starts with an initial network with $N_0$ vertices and $m_0$
edges. Each new vertex is attached to $m$ other vertices with probabilities that depend on
the importance and relevance of those vertices.

In RIPA, the importance of a vertex is measured by its degree, as in classical preferential
attachment. For the relevance, we introduce a metric space. In a metric space $\Omega$, the
distance between two elements $x, y \in \Omega$ is given by $d(x, y)$. Their relevance
$\rho(x, y)$ is then defined as a non-increasing function of the distance between them,
$\rho(x, y) = f(d(x, y))$, satisfying $f(0) = 1$ and $f(\infty) = 0$. A typical example is
$f(x) = e^{-x}$, but $f$ can also have a power-law tail.
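The definition of relevance can be illustrated with a minimal Python sketch. The function names are hypothetical, and the power-law-tailed form $(1 + d)^{-\alpha}$ is only one assumed choice that satisfies $f(0) = 1$ and $f(\infty) = 0$; the text does not prescribe a specific one.

```python
import math

def exp_relevance(d, lam=1.0):
    """Exponential relevance f(d) = exp(-lam * d); lam = 1 gives f(x) = e^{-x}."""
    return math.exp(-lam * d)

def power_law_relevance(d, alpha=2.0):
    """Assumed power-law-tailed relevance f(d) = (1 + d)^(-alpha), so that f(0) = 1."""
    return (1.0 + d) ** (-alpha)

def relevance(x, y, dist, f):
    """rho(x, y) = f(d(x, y)) for a caller-supplied metric `dist` and decay function `f`."""
    return f(dist(x, y))

# Illustration on the real line with d(x, y) = |x - y|.
def line_dist(x, y):
    return abs(x - y)

print(relevance(0.0, 0.0, line_dist, exp_relevance))        # f(0)
print(relevance(0.0, 3.0, line_dist, power_law_relevance))  # (1 + 3)^(-2)
```

Keeping the metric and the decay function as separate arguments mirrors the definition $\rho(x, y) = f(d(x, y))$, so the same code can be reused on other metric spaces.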

The centrality defined below measures the general influence of an element $x$ on the whole
space:

$$C(x) = \int_\Omega \rho(x, x')\, dx'.$$

Centrality gives, in another sense, an “importance” determined by the position in the
underlying metric space rather than by the connectivity to other vertices. In the scenario
of between-city transportation, centrality measures the physical and geographical
transportation conditions of a position. In the scenario of scientific research, a research
topic with high centrality is a bridge between many other fields. In this letter, we
investigate some cases on metric spaces with constant centrality $C(x) = C$. Examples are:
(1) a square with periodic boundary conditions, (2) a sphere in 3-d space, and (3) the
$n$-dimensional binary vector space with the metric induced by the $L^1$ norm. In these
spaces, there is no “center” position and every element is at an equivalent place.
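As a rough numerical illustration of constant centrality, the sketch below estimates $C(x)$ at two different positions on the unit square with periodic boundary conditions, assuming the exponential relevance $f(d) = e^{-\lambda d}$ with $\lambda = 10$. The function names, the grid resolution and the midpoint rule are all choices made here for illustration, not part of the model.

```python
import math

def torus_distance(x, y):
    """Euclidean distance on the unit square with periodic boundary conditions."""
    dx = abs(x[0] - y[0])
    dy = abs(x[1] - y[1])
    dx = min(dx, 1.0 - dx)
    dy = min(dy, 1.0 - dy)
    return math.hypot(dx, dy)

def centrality(x, lam=10.0, n=100):
    """Midpoint-rule estimate of C(x), the integral of exp(-lam * d(x, x')) over the square."""
    h = 1.0 / n
    total = 0.0
    for a in range(n):
        for b in range(n):
            xp = ((a + 0.5) * h, (b + 0.5) * h)
            total += math.exp(-lam * torus_distance(x, xp))
    return total * h * h

c_middle = centrality((0.5, 0.5))
c_edge = centrality((0.033, 0.971))
print(c_middle, c_edge)  # the two estimates should agree up to discretization error
```

On a square without periodic boundaries the same computation would give a smaller value near the edges, which is why the periodic case is the convenient one.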

A further restriction on the relevance $\rho$, and hence on $f$, is that the integral in the
definition of centrality should be well-defined. This restriction is important, especially
when we consider the large network limit.

In RIPA, a new vertex $j$ is attached to a preexisting vertex $i$ with probability

$$\Pi_{ij} = \frac{k_i \rho_{ij}}{z(x_j)}.$$

Here $k_i$ is the degree of $i$, indicating its importance, and $\rho_{ij} = \rho(x_i, x_j)$ is
the relevance between $i$ and $j$. $z(x_j)$ is the normalization constant chosen so that
$\sum_i \Pi_{ij} = 1$. More generally, $z(x)$ is a function on $\Omega$, called the local
partition, defined by

$$z(x) = \sum_i k_i \rho(x_i, x).$$


The summation here runs over all existing vertices. A position $x \in \Omega$ with a higher
local partition $z(x)$ has more overall relevance to the previous vertices and may therefore
attract more interest. So we suggest that $p(x)$, the probability of emergence of a new
vertex at $x$, is proportional to $z(x)$:

$$p(x) = \frac{z(x)}{Z},$$

where $Z$ is the global partition function

$$Z = \int_\Omega z(x)\, dx = \int_\Omega \sum_j k_j \rho(x_j, x)\, dx = \sum_j k_j C(x_j).$$


We summarize the RIPA algorithm as follows:

• 1. Begin with a network with $N_0$ nodes.

• 2. For $j = N_0 + 1$ to $N$:

2.1 Add a new node $j$ at the position $x_j$ with probability $p(x_j) = z(x_j)/Z$.

2.2 Attach $j$ to $m$ preexisting nodes $i$ with probability $\Pi_{ij} = k_i \rho_{ij} / z(x_j)$.
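The steps above can be sketched in Python for the unit square with periodic boundary conditions and exponential relevance. This is a minimal illustration under several assumptions that the text does not specify: the seed network is a complete graph on $m + 1$ uniformly placed nodes, the $m$ targets are drawn one at a time until they are distinct, and all names are hypothetical. For step 2.1 it uses the fact that, when the centrality is constant, $z(x)/Z$ is a degree-weighted mixture of the kernels $\rho(x_i, \cdot)/C$, so one can pick an existing vertex with probability proportional to its degree and then rejection-sample a position around it.

```python
import math
import random

def torus_distance(x, y):
    """Euclidean distance on the unit square with periodic boundary conditions."""
    dx = abs(x[0] - y[0])
    dy = abs(x[1] - y[1])
    return math.hypot(min(dx, 1.0 - dx), min(dy, 1.0 - dy))

def ripa(n_total, m=1, lam=10.0, seed=0):
    """Grow a RIPA-style network; returns (positions, edges, degrees)."""
    rng = random.Random(seed)

    def rho(a, b):
        return math.exp(-lam * torus_distance(a, b))

    # Assumed seed network: m + 1 mutually connected nodes at uniform positions.
    n0 = m + 1
    pos = [(rng.random(), rng.random()) for _ in range(n0)]
    edges = [(a, b) for a in range(n0) for b in range(a)]
    deg = [n0 - 1] * n0

    while len(pos) < n_total:
        # Step 2.1: sample x with density z(x)/Z (mixture plus rejection sampling).
        anchor = rng.choices(range(len(pos)), weights=deg)[0]
        while True:
            x = (rng.random(), rng.random())
            if rng.random() < rho(pos[anchor], x):
                break
        # Step 2.2: pick m distinct targets with weights k_i * rho(x_i, x).
        weights = [deg[i] * rho(pos[i], x) for i in range(len(pos))]
        targets = set()
        while len(targets) < m:
            targets.add(rng.choices(range(len(pos)), weights=weights)[0])
        new = len(pos)
        pos.append(x)
        deg.append(0)
        for t in targets:
            edges.append((new, t))
            deg[t] += 1
            deg[new] += 1
    return pos, edges, deg

pos, edges, deg = ripa(200, m=2)
print(len(pos), len(edges), max(deg))
```

The rejection step accepts a uniform proposal with probability $\rho(x_i, x) \le 1$, so its acceptance rate equals the centrality $C$; for strongly localized relevance a grid-based or radial sampler would be more efficient.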


In a metric space with constant centrality, we further have $Z = KC$, where
$K = \sum_i k_i = m_0 + mt$ is the total degree of the network, which grows linearly with
time $t$. The expected change of the degree of vertex $i$ is given by

$$E\left[\frac{dk_i}{dt}\right] = \int_\Omega \Pi_{ij}\, p(x_j)\, dx_j = \int_\Omega \frac{k_i \rho_{ij}}{z(x_j)} \frac{z(x_j)}{Z}\, dx_j = \frac{k_i C(x_i)}{Z}.$$

The above equation shows that the degree of a vertex grows at an expected rate proportional
to its current degree, which is exactly the relation that holds in the standard preferential
attachment algorithm. So we also obtain the power-law degree distribution
$p(k) \sim k^{-\gamma}$ with $\gamma = 3$. In addition, the change of the local partition
$z(x)$ comes from two parts: the growth of the degrees of the existing vertices, and the new
vertex. When the centrality is a constant $C$, we have


$$E\left[\frac{dz(x)}{dt}\right] = \sum_i E\left[\frac{dk_i}{dt}\right] \rho(x_i, x) + m \int_\Omega \rho(x', x)\, p(x')\, dx' = \frac{C}{Z} \left( z(x) + m \bar{z}(x) \right).$$

Here $\bar{z}(x) = \frac{1}{C} \int_\Omega z(x') \rho(x', x)\, dx'$ is an average of $z$ in the
neighborhood of $x$ with weight function $\rho(x', x)$. The above equation can be rewritten as


$$E\left[\frac{dz(x)}{dt}\right] = \frac{C}{Z} \left[ (m + 1) z(x) + m \left( \bar{z}(x) - z(x) \right) \right].$$

On the right hand side, the first term corresponds to exponential growth, which tends to
generate a scale-free distribution of $z(x)$; the second term is a diffusion term, which
smooths the distribution of $z(x)$.
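The smoothing role of the second term can be checked numerically on a grid. The sketch below is only an illustration under assumptions: it uses the unit square with periodic boundary conditions, exponential relevance with $\lambda = 10$, three made-up vertices, and a discrete neighborhood average in which the kernel weights are normalized by their sum (the discrete counterpart of dividing by $C$). All names are hypothetical.

```python
import math

def torus_distance(x, y):
    """Euclidean distance on the unit square with periodic boundary conditions."""
    dx = abs(x[0] - y[0])
    dy = abs(x[1] - y[1])
    return math.hypot(min(dx, 1.0 - dx), min(dy, 1.0 - dy))

def local_partition_grid(pos, deg, lam, n):
    """Evaluate z(x) = sum_i k_i exp(-lam d(x_i, x)) at the n * n cell centers."""
    h = 1.0 / n
    cells = [((a + 0.5) * h, (b + 0.5) * h) for a in range(n) for b in range(n)]
    z = [sum(k * math.exp(-lam * torus_distance(p, c)) for p, k in zip(pos, deg))
         for c in cells]
    return cells, z

def neighborhood_average(cells, z, lam):
    """Discrete z_bar: kernel-weighted average of z, weights normalized to sum to one."""
    zbar = []
    for c in cells:
        w = [math.exp(-lam * torus_distance(c, cp)) for cp in cells]
        zbar.append(sum(wi * zi for wi, zi in zip(w, z)) / sum(w))
    return zbar

# Made-up example: one large vertex and two small ones.
pos = [(0.2, 0.2), (0.7, 0.6), (0.75, 0.65)]
deg = [5, 2, 1]
cells, z = local_partition_grid(pos, deg, lam=10.0, n=16)
zbar = neighborhood_average(cells, z, lam=10.0)
diffusion = [zb - zi for zb, zi in zip(zbar, z)]  # the (z_bar - z) factor
peak = z.index(max(z))
print(z[peak], zbar[peak], diffusion[peak])
```

At the peak of $z$ the difference $\bar{z} - z$ is negative, while in the valleys it is positive, which is the behaviour expected from a diffusion-like term.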


3  Between-city  transportation 


In this section we focus on RIPA on 2-dimensional surfaces, which corresponds to the case of
between-city transportation. First, we consider networks generated by RIPA on the unit
square $D$ with periodic boundary conditions. The relevance $\rho$ is given by
$f(x) = \exp(-\lambda x)$. In this case the total partition function is

$$Z = \int_D \sum_j k_j e^{-\lambda d(x_j, x)}\, dx.$$


Figure 1 shows one realization of the network. Each circle in the figure represents a city:
the center of the circle indicates the location of the city and the radius indicates its
degree. The color (brightness) of the background indicates the logarithm of the local
partition function $z(x)$. In Fig. 1, we observe that cities tend to gather but big cities
tend to separate: around the largest city (the capital), the bigger cities are found in
areas farther from the capital. This is because a huge city has two effects: (1) the local
partition in its neighborhood is larger and therefore attracts more new cities, and (2) it
attracts more links from new cities and therefore inhibits the growth of nearby cities. The
second effect is most significant when $m$ is small.

Next, we investigate the properties of the RIPA network model one by one in this special
case, and compare this network model with the BA network and the world airline network
(WAN). The latter is an empirical network from openflights.org.


3.1  Degree  distribution 

Fig. 2 shows the power-law degree distribution of the RIPA network. As analyzed before, the
number of vertices with degree $k$ scales as $k^{-\gamma}$. The exponent is $\gamma = 3$,
the same as in the BA network model.
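One way to check an exponent such as $\gamma = 3$ against simulated degrees is a maximum-likelihood (Hill-type) estimate on the tail. This estimator is a standard technique brought in here for illustration; it is not the method used to produce Fig. 2, and the function name and the synthetic test data are assumptions.

```python
import math
import random

def estimate_gamma(values, k_min, offset=0.0):
    """Maximum-likelihood estimate of gamma for a tail p(k) ~ k^(-gamma), k >= k_min.

    offset=0.0 is the estimator for continuous data; offset=0.5 is a commonly
    used approximation for integer-valued data such as degrees.
    """
    tail = [v for v in values if v >= k_min]
    log_sum = sum(math.log(v / (k_min - offset)) for v in tail)
    return 1.0 + len(tail) / log_sum

# Synthetic check: continuous samples with density ~ x^(-3) on [1, inf),
# drawn by inverse-transform sampling (P(X > x) = x^(-2)).
rng = random.Random(0)
samples = [(1.0 - rng.random()) ** (-0.5) for _ in range(20000)]
gamma_hat = estimate_gamma(samples, k_min=1.0)
print(gamma_hat)  # should be close to 3, up to sampling error
```

For degree sequences from a simulation, one would call `estimate_gamma(degrees, k_min, offset=0.5)` with a `k_min` large enough that the power-law form holds.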


3.2  Clustering  Coefficient 

The clustering coefficient quantifies how well connected the neighbors of a node are in a
network. In the RIPA network model, because of the underlying metric space, “relevance”
is naturally transitive, i.e. two objects relevant to the same thing are more likely to be
relevant to each other. Consequently, the RIPA network has a significantly higher clustering
coefficient than the ER or BA networks. Fig. 3 shows the clustering coefficients of the RIPA
network, the BA network and the WAN network.

Figure 1: Network generated on the unit square with periodic boundary conditions, $m = 1$,
$N = 5000$, $\lambda = 10$. The circles are centered at the locations of the cities and the
radii represent their degrees. The background color indicates the logarithm of the local
partition.
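For reference, the clustering coefficient discussed in this subsection can be computed from an edge list as in the following sketch. It assumes the common definition as the average of the local clustering coefficients, with nodes of degree below two counted as zero; the text does not state which variant was used for Fig. 3, and the function name is hypothetical.

```python
from itertools import combinations

def average_clustering(n, edges):
    """Average local clustering coefficient of an undirected graph on nodes 0..n-1."""
    nbrs = [set() for _ in range(n)]
    for a, b in edges:
        if a != b:  # ignore self-loops
            nbrs[a].add(b)
            nbrs[b].add(a)
    total = 0.0
    for v in range(n):
        k = len(nbrs[v])
        if k < 2:
            continue  # assumed convention: local clustering is 0 for degree < 2
        # Count the links that exist among the neighbors of v.
        links = sum(1 for a, b in combinations(nbrs[v], 2) if b in nbrs[a])
        total += 2.0 * links / (k * (k - 1))
    return total / n

# A triangle (0, 1, 2) with one pendant node 3 attached to node 0.
print(average_clustering(4, [(0, 1), (1, 2), (0, 2), (0, 3)]))
```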


4  Average  path-length 

In the area of complex networks, we say a network is a “small world” if the average
path-length between two arbitrary nodes in the network is of order no more than
$O(\ln N)$ as the network size $N$ grows. There are two different large-$N$ limits of this
network model. One is the non-extensive limit, in which the metric space stays the same and
the density of nodes increases to infinity. The other is the extensive limit, in which the
density of nodes stays the same and the metric space extends to infinity. In the latter
case, an equivalent approach is to keep the metric space the same and rescale the metric.
For instance, on the unit square, the metric $d(x, y)$ should be rescaled as
$d_N(x, y) = \sqrt{N}\, d(x, y)$, so that the average density of nodes stays constant as
$N$ grows.
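The two quantities in this paragraph can be sketched in a few lines of Python: the average path-length, computed here by breadth-first search over all connected pairs, and the rescaled metric used for the extensive limit. The function names are hypothetical, and averaging only over connected pairs is an assumption made for the sketch.

```python
import math
from collections import deque

def average_path_length(n, edges):
    """Mean shortest-path length over all connected ordered pairs of distinct nodes."""
    nbrs = [[] for _ in range(n)]
    for a, b in edges:
        nbrs[a].append(b)
        nbrs[b].append(a)
    total, pairs = 0, 0
    for s in range(n):
        dist = {s: 0}
        queue = deque([s])
        while queue:  # breadth-first search from s
            v = queue.popleft()
            for w in nbrs[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0

def rescaled_distance(d, n_nodes):
    """Extensive-limit metric d_N = sqrt(N) * d on the unit square."""
    return math.sqrt(n_nodes) * d

print(average_path_length(3, [(0, 1), (1, 2)]))  # path graph 0 - 1 - 2
print(rescaled_distance(0.5, 4))
```

Comparing the growth of `average_path_length` with $\ln N$ over a range of network sizes is one way to test the small-world criterion numerically.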

According to Fig. 4, RIPA under the non-extensive limit is always a small world. The
average path-length even decays slightly as $N$ grows. This observation can be interpreted
as follows: transportation in a fixed area becomes more convenient when there are more
choices of transit points. We also observe that RIPA under the extensive limit is a small
world when the relevance function $f$ has a power-law decay ($f(d) = d^{-2}$), but is not
when $f$ has an exponential decay ($f(d) = e^{-\lambda d}$). From the physics point of view,
the two relevance functions are analogues of long-range and short-range correlations. So
this observation can be summarized as: the RIPA network is a small world when the relevance
function represents a long-range correlation.

Figure 2: Power-law degree distribution of networks when $m = 1, 5$, $N = 5000, 10000, 20000$,
$\lambda = 10$.

Figure 3: Clustering coefficient $C$ as a function of network size $N$. RIPA1 for $m = 3$,
RIPA2 for $m = 10$.

Figure 4: Average path-length $L$ in the RIPA network as the network size $N$ grows. Red plots
are for RIPA under the non-extensive large-$N$ limit. Blue and green plots are for RIPA
under the extensive large-$N$ limit. The blue plot is for the relevance function with
power-law decay; the green one is for the relevance function with exponential decay.

The following theorem gives a criterion for when the RIPA network on two-dimensional space
is not a small world.

Theorem: The network is not a small world network if the

5 Core-periphery structure

Core-periphery structure is observed in several real world complex networks. In a network
with this kind of structure, there is a tightly connected subnetwork called the “core”,
while the complementary subnetwork, the periphery, is fragmented and mostly attached to
the core. A significant feature of the core-periphery structure is that the network is
vulnerable to attacks on the core: by successively removing nodes from the core, the whole
network quickly falls into several disconnected parts. Fig. 5 shows how the giant cluster
size decreases as nodes are removed in descending order of degree. As shown in the figure,
the BA network has hubs and is therefore more vulnerable to attacks on high-degree nodes
than the ER networks, but it still has a high threshold (about 0.5 in the figure) at which
the giant cluster size decays fast. For RIPA and WAN, however, the giant cluster sizes both
decrease quickly from the very beginning. So the RIPA network model captures the
core-periphery structure seen in the WAN network.
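The attack experiment described above can be sketched as follows. The sketch assumes that nodes are ranked once by their initial degree (the text does not say whether degrees are recomputed after each removal) and that the giant cluster size is reported as a fraction of the original number of nodes; the function name is hypothetical.

```python
def giant_cluster_fraction(n, edges, removed_fraction):
    """Largest connected component, as a fraction of n, after removing the
    highest-degree nodes (ranked by initial degree)."""
    deg = [0] * n
    nbrs = [[] for _ in range(n)]
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
        nbrs[a].append(b)
        nbrs[b].append(a)
    order = sorted(range(n), key=lambda v: -deg[v])
    removed = set(order[:int(removed_fraction * n)])

    best = 0
    seen = set(removed)  # removed nodes are never visited
    for s in range(n):
        if s in seen:
            continue
        seen.add(s)
        stack, size = [s], 0
        while stack:  # depth-first traversal of one component
            v = stack.pop()
            size += 1
            for w in nbrs[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        best = max(best, size)
    return best / n

# A star with hub 0 falls apart as soon as the hub is removed.
star = [(0, i) for i in range(1, 6)]
print(giant_cluster_fraction(6, star, 0.0), giant_cluster_fraction(6, star, 0.2))
```

Sweeping `removed_fraction` from 0 to 1 for different networks gives curves that can be compared in the way the paragraph describes.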


Figure 5: Giant cluster size $g$ after removal of a fraction $f_r$ of the nodes in descending
order of degree.

5.1 RIPA on the Sphere

Similarly, we implement RIPA on the sphere, where the metric is given by the spherical
distance. As shown in Fig. 6, the . Interestingly, some qualitative behavior is quite stable
in the simulations, e.g. the spherical angle between the two largest hubs is usually
around $0.6\pi$ to $0.7\pi$. However, this network does not exactly match the case of the
earth. On the earth, cities can only be located on the continents, and the metric is not
uniform: oceans, rivers and mountains may affect the effective distance.
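For the sphere, the ingredients that change relative to the square are the metric and the way positions are drawn. The following sketch shows one possible choice, assuming a unit sphere, points stored as unit vectors, and exponential relevance with $\lambda = 5$; the function names are hypothetical.

```python
import math
import random

def spherical_distance(u, v):
    """Great-circle distance (in radians) between two unit vectors on the unit sphere."""
    dot = sum(a * b for a, b in zip(u, v))
    return math.acos(max(-1.0, min(1.0, dot)))  # clamp to guard against rounding

def random_point_on_sphere(rng):
    """Uniform point on the unit sphere, obtained by normalizing a Gaussian vector."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-12:
            return tuple(c / norm for c in v)

def sphere_relevance(u, v, lam=5.0):
    """Exponential relevance exp(-lam * d) with d the spherical distance."""
    return math.exp(-lam * spherical_distance(u, v))

north = (0.0, 0.0, 1.0)
equator = (1.0, 0.0, 0.0)
print(spherical_distance(north, equator))  # a quarter of a great circle
print(sphere_relevance(north, random_point_on_sphere(random.Random(0))))
```

The spherical angle between two hubs mentioned in the text is exactly `spherical_distance` applied to their positions.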

Figure 6: Network generated on the sphere with $m = 3$, $N = 5000$, $\lambda = 5$. The two
plots are views of the same sphere from different angles. The color (brightness) indicates
the logarithm of the local partition.